OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Do Thesauri enhance rule-based categorization for OCR text?

Internal identifier: 001799 (Main/Exploration); previous: 001798; next: 001800

Authors: Kazem Taghva [United States]; Jeffrey Coombs [United States]

Source: SPIE proceedings series (ISSN 1017-2653), 2003

RBID : Pascal:03-0421336

Abstract

A rule-based automatic text categorizer was tested to see if two types of thesaurus expansion, called query expansion and Junker expansion respectively, would improve categorization. Thesauri used were domain-specific to an OCR (Optical Character Recognition) test collection focussed on a single topic. Results show that neither type of expansion significantly improved categorization.
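
This record does not specify how either expansion type was implemented. As a rough illustration only, the general idea of expanding a categorization rule with thesaurus terms can be sketched as follows. The thesaurus entries, the rule format, and the function names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy thesaurus: each term maps to a set of related terms.
# These entries are invented for illustration, not taken from the paper.
THESAURUS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "engine": {"motor"},
}


def rule_matches(
    rule_terms: Iterable[str],
    document_text: str,
    thesaurus: Optional[Dict[str, Set[str]]] = None,
) -> bool:
    """Return True when every rule term occurs in the text.

    With a thesaurus, a rule term also counts as present when any of
    its related terms occurs (a simple form of term expansion).
    Matching is a plain case-insensitive substring test.
    """
    text = document_text.lower()
    for term in rule_terms:
        variants = {term}
        if thesaurus is not None:
            variants |= thesaurus.get(term, set())
        if not any(variant in text for variant in variants):
            return False
    return True


doc = "The automobile was fitted with a new motor."
rule = ["car", "engine"]

without_expansion = rule_matches(rule, doc)
with_expansion = rule_matches(rule, doc, THESAURUS)
```

In this toy case the unexpanded rule does not fire on the document, while the expanded rule does. Whether such expansion helps on noisy OCR text is exactly what the study evaluated, and its reported answer was that neither expansion type gave a significant improvement.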


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Do Thesauri enhance rule-based categorization for OCR text?</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">03-0421336</idno>
<date when="2003">2003</date>
<idno type="stanalyst">PASCAL 03-0421336 INIST</idno>
<idno type="RBID">Pascal:03-0421336</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000597</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000194</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000563</idno>
<idno type="wicri:doubleKey">1017-2653:2003:Taghva K:do:thesauri:enhance</idno>
<idno type="wicri:Area/Main/Merge">001877</idno>
<idno type="wicri:Area/Main/Curation">001799</idno>
<idno type="wicri:Area/Main/Exploration">001799</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Do Thesauri enhance rule-based categorization for OCR text?</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="2003">2003</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic classification</term>
<term>Categorization</term>
<term>Improvement</term>
<term>Optical character recognition</term>
<term>Performance evaluation</term>
<term>Query expansion</term>
<term>Thesaurus</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance optique caractère</term>
<term>Catégorisation</term>
<term>Classification automatique</term>
<term>Thesaurus</term>
<term>Amélioration</term>
<term>Evaluation performance</term>
<term>Règle</term>
<term>Junker (M.)</term>
<term>C-KANT (Clips Knowledge Acquisition eNgine for Text categorization)</term>
<term>Elargissement question</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A rule-based automatic text categorizer was tested to see if two types of thesaurus expansion, called query expansion and Junker expansion respectively, would improve categorization. Thesauri used were domain-specific to an OCR (Optical Character Recognition) test collection focussed on a single topic. Results show that neither type of expansion significantly improved categorization.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Nevada</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Nevada">
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</region>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001799 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001799 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:03-0421336
   |texte=   Do Thesauri enhance rule-based categorization for OCR text?
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024